(57) ABSTRACT

A processor-based system may utilize a remote control unit 
which not only allows mouse input commands to be pro- 
vided to the processor-based system but also includes a 
microphone and a speech engine for decoding spoken com- 
mands and providing code for presenting the commands to 
the processor-based unit. The processor-based system may
provide information to the remote control unit about the 
vocabulary currently being used by applications active on 
the processor-based system. This allows the speech engine 
in the remote control unit to focus on a more limited 
vocabulary, increasing the accuracy of the speech recogni- 
tion function and decreasing the capabilities necessary in the 
remote control unit based speech engine. 

28 Claims, 6 Drawing Sheets 
REMOTE CONTROL WITH SPEECH
RECOGNITION 

BACKGROUND

This invention relates generally to speech recognition and 
particularly to the control of computer software using spo- 
ken commands. 

Currently available speech recognition software recog- 
nizes discrete spoken words or phonemes contained within 
words in order to identify spoken commands. The process- 
ing of the spoken commands is usually accomplished using 
what is known as a speech engine. Regardless of whether 
discrete terms or phonemes are utilized, the speech engine is 
called by the application program which needs the speech 
recognition service. 

Operating systems may include Application Program 
Interface (API) software utilities which provide speech 
recognition. An application may incorporate a call to the 
speech API or the speech recognition may be supplied 
externally by a second application that intercepts the speech 
and feeds the first application simulated keys or commands 
based on the speech input information. 

Speech recognition technology has been applied to con- 
trolling processor-based systems including desktop com- 
puter systems. A variety of different speech recognition 
software is available, some of which comes with a microphone
which may be worn by the user. Apparently, the idea
is that extraneous sounds around the system, such as the
system cooling fan, may degrade the speech recognition
quality. The microphone feeds into a sound port, usually on
the back of the processor-based system. The use of the 
microphone allows the speech recognition engine to process 
the sounds less influenced by surrounding noise. 

However, there is a continuing need for better ways to 
implement speech recognition services for processor-based 
systems. 

SUMMARY 

In accordance with one aspect, a processor-based system 
includes a first processor-based device having an airwave 
communication transceiver. A remote control unit has an 
airwave communication transceiver to communicate with 
the first processor-based device. The remote control unit 
includes a speech engine and a microphone coupled to the 
speech engine. 

Other aspects are set forth in the accompanying detailed 
description and claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a front elevational view of a remotely controlled 
processor-based system; 

FIG. 2 is a block diagram of a speech recognition system; 

FIGS. 3-5 are flow diagrams for the speech recognition 
system shown in FIG. 1; 

FIG. 6 is a schematic view of a computer display with two 
active windows; 

FIG. 7 is a flow diagram of a program in accordance with 
one embodiment; and 

FIG. 8 is a block diagram of a hardware system for use 
with the speech recognition system. 

DETAILED DESCRIPTION 

Referring to FIG. 1, a processor-based system 130, illustrated
as a set top computer system, includes a processor-based
unit 110 which sits atop a television receiver 112. The
television receiver and the processor-based unit 110 may be
controlled by a remote control unit 114. The remote control
unit may communicate through its own transceiver 118 with
a transceiver 134 on the processor-based unit 110 and a
transceiver 128 on the television receiver 112. The communications
between the remote control unit 114 and the
television receiver/processor-based unit may use any of a
variety of airwave communications including infrared,
ultrasonic or radiowave signaling.

While the present invention has been illustrated in connection
with a set top computer system, those skilled in the
art will appreciate that the present application is also applicable
to any of a variety of other processor-based systems
including desktop computers, laptop computers and a variety
of other processor-based appliances.

The remote control unit (RCU) 114 includes a microphone
126. It also includes a cursor control system 116
which operates essentially like a mouse. The RCU 114
includes a mouse button 122 and a plurality of cursor
direction control buttons 120. Thus, the position of a cursor
or highlighting on a screen 132 may be controlled by
operating one of the four directional control buttons 120.
When the desired icon is indicated on the screen 132, it may
be selected by operating the button 122. The remote control
unit 114 may also include a numerical keypad 124.

Referring to FIG. 2, a speech recognition system 11,
operating on the RCU 114, works with an application
software program 10, running on the processor-based unit
110, which needs to respond to spoken commands. For
example, the application 10 may be implemented through
various graphical user interfaces or windows in association
with the Windows® operating system. Those windows may
call for user selection of various tasks or control inputs. The
application 10 may respond either to spoken commands or
tactile input commands. Tactile input commands may
include pushing a keyboard key, touching a display screen,
or mouse clicking on a visual interface, using the RCU 114.

The application 10 communicates with a server 12. In an
object oriented programming language, the server 12 could
be a container. In the illustrated embodiment, the server 12
communicates with the control 14 which could be an object
or an ActiveX control, for example. The control 14 also
communicates directly with the application 10.

The server 12 can call the speech recognition engine 16.
At the same time, a driver 18 can provide input signals to the
server 12 and the control 14. Thus, in some embodiments,
the control 14 can receive either spoken or tactile inputs
(from the driver 18) and act in response to each type of
input command in essentially the same way.
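
The uniform handling of spoken and tactile inputs described above can be illustrated with a minimal Python sketch. It is not the ActiveX implementation described in this document. The class names `InputCommand` and `Control`, the numeric identifier `OK_ID` and the `source` field are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class InputCommand:
    """A command delivered to the control, whatever produced it."""
    identifier: int  # hypothetical identifier for the requested action
    source: str      # "speech" or "tactile"; carried along but not used for dispatch


class Control:
    """Illustrative stand-in for the control 14: maps identifiers to actions."""

    def __init__(self) -> None:
        self._actions: Dict[int, Callable[[], str]] = {}
        self.log: List[str] = []

    def bind(self, identifier: int, action: Callable[[], str]) -> None:
        self._actions[identifier] = action

    def handle(self, command: InputCommand) -> str:
        # The same lookup runs for both sources; the source is never consulted.
        result = self._actions[command.identifier]()
        self.log.append(result)
        return result


OK_ID = 1  # hypothetical identifier for an "ok" button

control = Control()
control.bind(OK_ID, lambda: "ok pressed")

spoken = control.handle(InputCommand(OK_ID, "speech"))
clicked = control.handle(InputCommand(OK_ID, "tactile"))
```

Because dispatch depends only on the identifier, a spoken command and a button press that carry the same identifier trigger the same action.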

Referring to FIG. 3, a program for recognizing speech
may involve beginning an application (block 90) on the
processor-based unit 110 that needs speech recognition
services. The speech engine is provided with a vocabulary of
command sets for an active screen or task, as indicated in
block 92. The command sets could be the vocabulary for
each of the various applications that are implemented by the
particular computer system or by a particular application
program. The command set for the application that
is currently running is communicated to the server 12 or
control 14 (block 94). Next, the speech is recognized and
appropriate actions are taken, as indicated in block 96.
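
The flow of FIG. 3 can be loosely sketched in Python as follows. This is an illustration under stated assumptions, not the described implementation: the `SpeechEngine` class, the `COMMAND_SETS` table, the application names and the exact-match "recognition" are all hypothetical stand-ins, and real acoustic decoding is omitted.

```python
from typing import Dict, Optional, Set

# Hypothetical per-application command sets (loosely, block 92).
COMMAND_SETS: Dict[str, Set[str]] = {
    "navigator": {"back", "forward", "home"},
    "tv_guide": {"up", "down", "select"},
}


class SpeechEngine:
    """Toy engine that only accepts words in its active command set."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    def load_command_set(self, words: Set[str]) -> None:
        # Restrict decoding to the vocabulary of the current task.
        self.active = {w.lower() for w in words}

    def recognize(self, utterance: str) -> Optional[str]:
        # Exact string match stands in for real speech decoding (loosely, block 96).
        word = utterance.strip().lower()
        return word if word in self.active else None


def start_application(engine: SpeechEngine, name: str) -> None:
    # Communicate the running application's command set (loosely, block 94).
    engine.load_command_set(COMMAND_SETS[name])


engine = SpeechEngine()
start_application(engine, "navigator")
```

With the "navigator" set active, `engine.recognize("Home")` returns `"home"`, while `engine.recognize("select")` returns `None` until the "tv_guide" set is loaded.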

Another implementation, shown in FIG. 4, also begins
with starting an application, as indicated in block 98. Speech
units that need to be decoded are associated with identifiers
(block 100). The identifiers may then be associated with a
particular action to be taken in the application in response to
the spoken command (block 102). Next, the flow determines
the identifier for a particular spoken speech unit (block 104).
The identifier is provided to a software object such as the
control 14, as indicated in block 106. An event is fired when
the object receives the command, as shown in block 108.
The event may be fired by the object whether the command
is a result of a spoken command or a tactilely generated
command.

Referring to FIG. 5, the application 10 passes a grammar
table to the server 12 (block 20). In particular, the application
initializes the grammar with speech identifiers associated
with each spoken command used in the application.
These commands make up all of the command sets for a
given engine. The grammar is a set of commands that may
include alternative phrases. For example, a simple grammar
could be (start/begin)(game). This grammar would
respond to the spoken commands "start game" and "begin
game".

The speech recognition engine 16 can operate on phonemes
or with discrete terms. Thus, the application provides
the particular command set (which is a subset of the engine's
available commands) with the active application. This facilitates
speech recognition because the speech recognition
engine can be advised of the particular words (command set)
that are likely to be used in the particular application that is
running. Thus, the speech recognition engine only needs to
match the spoken words with a smaller sub-vocabulary. For
example, if the game function was operating, only the
command set of words associated with that application need
be decoded.

The server 12 initializes the speech engine 16 (block 22)
with the phrase and identifier table 36,
as indicated in FIG. 2. The application 10 also sends the
speech identifiers associated with given spoken commands
to the control 14 or server 12 (block 24). When the control
14 is activated in the container or server, the control may call
the OnControlInfoChanged method in the IOleControlSite
interface, in an embodiment using ActiveX controls. This
provides for transfer of information from the control 14 to
the server 12 (block 26). The server in turn may call the
GetControlInfo method from the IOleControl interface
which allows communications from the server or container
12 to the control 14 (block 28).

The server uses the GetControlInfo method in the IOleControl
interface and the OnMnemonic method in IOleControl
to request identifiers from the control. The control may
provide this information through the IOleControlSite interface
and the OnControlInfoChanged method, using ActiveX
technology for example.

The server 12 enables the speech engine 16 (block 30), for
any commands that are active, from the server's table 36.
The server uses the table 36 from the application to provide
focus in particular applications. The control provides an
effect comparable to that of an accelerator key. Namely, it
provides a function that can be invoked from any window or
frame reference. The application provides the speech identifiers
and associates the identifiers with an action by the
control.

The server knows which vocabulary to use based on what
task is running currently. In a system using windows this
would correspond to the active screen. Thus, if the navigator
is running, the server knows what the sub-vocabulary is that
must be recognized by the speech engine.

When the server receives a speech message, it calls the
speech API in the engine 16. When a phrase is detected, the
engine provides the phrase to the server, for example, as a
text message. The container does a table look-up (block 32).
On a match between the phrase and the identifier, the server
12 may call the OnMnemonic method of the IOleControl
interface, passing the identifier to the control. The control
follows its preprogrammed rules and implements the corresponding
action (block 34). The control may handle the
message internally or send an event to the server.

As a simple example, a given screen may include two
buttons, "ok" and "delete". When the application comes up
it sends the grammar for this screen to the server. For
example, the grammar for "ok" might include "ok", "right"
and "correct".

The application then associates "ok" with an identifier
which corresponds to a particular control and does the same
thing with "delete". The identifier is simply a pointer or
handle that is unique, within the application, to the particular
command. The table 36 then includes the phrases "ok" and
"delete", an identifier for each phrase and an identifier for
the control that handles the command.

When a control is instantiated, the application provides it
with its identifier. The control is preprogrammed with the
action it will take when the server advises the control that its
identifier has been called.

When a speaker uses a word, the speech engine sends the
word to the server. The server checks the phrases in its table
36 to see if the word is in its active list. In the simple
example, if the word from the speech engine is not "ok"
or "delete", it is discarded. This would indicate a speech
engine error. If there is a match between the word and the
active vocabulary, the server sends the appropriate control
identifier to the appropriate control, which then acts according
to its programmed instructions.

A phoneme based speech engine with a large vocabulary
can be used with high reliability because the engine is
focused on a limited vocabulary at any given time. Advantageously,
this limited vocabulary may be less than 20 words
in the table 36 at any given instance.

This frees the application from having to keep track of the
active vocabulary. The application can tell the server which
words to watch for at a given instance based on the active
task's vocabulary.

There may also be a global vocabulary that is always
available regardless of the active screen. For example, there
may be a "Jump" command to switch screens or an "Off"
command to terminate the active task.

Advantageously, the existing mnemonics or "hot keys"
available in Microsoft Windows® may be used to implement
speech recognition. For example, the OnMnemonic
method may be given the new function of passing information
from the server to the control corresponding to a spoken
command.

While the methodology is described in connection with an
ActiveX control, other object oriented programming technologies
may be used as well including, for example,
Javabeans and COM. In addition, still other such techniques
may be developed in the future.

With embodiments of the present invention, an effect
comparable to that of an accelerator key is provided. It gives
a focus to the command with reference to a particular
application. Therefore, speech can be used to focus between
two operating tasks. For example, as shown in FIG. 6, if two
windows A and B are open at the same time on the screen
76, the command that is spoken can be recognized as being
associated with one of the two active task windows or
frames. Referring to FIG. 7, after a command is recognized
(block 78), the application provides information about what
is the primary, currently operating task and the speech may
be associated with that particular task to provide focus
(block 80). An input is then provided to one of the tasks (and
not the other), as indicated at block 82. The speech recognition
is accomplished in a way which is effectively invisible
to the application. To the application, it seems as though the
operating system is effectively doing the speech recognition
function. The need for synchronization is thereby reduced.

The message which is passed to the ActiveX control from
the container can include a field which allows the application
to know if the command was speech generated. This
may be useful, for example, when it is desired to give a
spoken response to a spoken command. Otherwise, the
application is basically oblivious to whether or not the
command was speech generated or tactilely generated.

While the application loads the identifiers into the
ActiveX controls (when they are instantiated), the controls
and the container handle all of the speech recognition for the
command words. The control and its container are responsible
for managing when the words are valid and for sending
appropriate messages to the application. Thus, the container
or server does all the communication with the speech
recognition API. The container may communicate with the
ActiveX controls by standard interfaces such as IOleControl.
As a result, the number of state errors that would
otherwise occur if the application were forced to handle the
speech recognition itself is reduced.

Referring next to FIG. 8, a hardware implementation for
the embodiment shown in FIG. 1 includes a processor 150.
In one embodiment, the processor may be coupled to an
accelerated graphics port (AGP) (see Accelerated Graphics
Port Interface Specification, Rev. 1.0, published Jul. 31,
1996 by Intel Corporation, Santa Clara, Calif.) chipset 152
for implementing an accelerated graphics port embodiment.
The chipset 152 communicates with the AGP port 154 and
the graphics accelerator 156. The television 112 may be
coupled to the video output of the graphics accelerator 156.
The chipset 152 accommodates the system memory 158.

The chipset 152 is also coupled to a bus 160. The bus 160
couples a television tuner/capture card 162 which is coupled
to an antenna 164 or other video input port, such as a cable
input port, a satellite receiver/antenna or the like. The
television tuner/capture card selects a desired television
channel and also performs a video capture function. One
exemplary video capture card is the ISVR-III Video Capture
Card available from Intel Corporation.

The bus 160 is also coupled to a bridge 166 which may
couple a storage device such as a hard disk drive 168 or a
flash memory. The drive 168 may store the software 62
(FIG. 3). The bridge 166 is also coupled to another bus 170.
The bus 170 may in turn be coupled to a serial input/output
(SIO) device 172. The device 172 is coupled to an infrared
interface 134. Also connected to the bus 170 is a basic
input/output system (BIOS) 174.

The IR interface 134 may communicate using infrared
signals with an IR interface 118 on the RCU 114. Any of a
variety of protocols may be utilized for implementing IR
communications. In addition, other forms of airwave communications
may be utilized as well.

The IR interface 118 on the RCU 114 communicates with
a controller 150a which may be a processor such as a digital
signal processor. The controller 150a communicates with
the keypad 116 on the RCU 114 and the memory 158a. The
controller 150a also receives spoken commands through the
microphone 126. The memory 158a may conveniently be
implemented by a flash memory. The memory 158a stores
the software 64 (FIG. 4), 66 (FIG. 5) and 68 (FIG. 7) for
implementing the speech recognition features.

There are a number of advantages inherent in using the
RCU 114 to implement speech recognition functions. First
of all, by placing the microphone 126 in the RCU 114, the
speech capture may be achieved closer to the speech source.
This may remove sources of ambient noise including those
associated with the cooling fan of the processor-based unit
110.

In addition, by placing the speech recognition functions in the
RCU 114, the bandwidth required of the communication path
between the RCU 114 and the processor-based unit 110 may
be diminished. Namely, by enabling an application running
on the processor-based unit 110 to communicate information
which allows a limited set of information to be utilized in the
RCU 114, the RCU may recognize the speech and provide
a relatively limited bandwidth consuming input command
over the infrared link to the processor-based unit 110. Since
the processor-based unit 110 can convey information to the
RCU 114 about what command set to expect, a relatively
small vocabulary speech engine may be implemented in the
RCU 114 without requiring substantial processor capabilities.

For example, since the RCU 114 is battery-based, it is
desirable to minimize the power usage in the RCU 114. By
implementing the system described above, for example
using flash memory on the RCU 114, a low power implementation
may be operated.

While the present invention has been described with
respect to a limited number of embodiments, those skilled in
the art will appreciate numerous modifications and variations
therefrom. It is intended that the appended claims
cover all such modifications and variations as fall within the
true spirit and scope of this present invention.

What is claimed is:

1. A processor-based system comprising:

a first processor-based device having an airwave communication
transceiver;

a remote control unit having an airwave communication
transceiver to communicate with said first processor-based
device, said remote control unit including a
speech engine and a microphone coupled to said speech
engine; and

wherein said first processor-based device includes software
to provide information to the remote control unit
about the application which is currently active and the
vocabulary used by the application, and said speech
engine being programmed to utilize a spoken command
and provide code corresponding to said spoken command
through said remote control unit transceiver to
said first processor-based device.

2. The system of claim 1 wherein said system is a set top
computer.

3. The system of claim 1 wherein said first processor-based
device includes an interface to provide information
about the currently active application running on said first
processor-based device to said remote control unit.

4. The system of claim 3 wherein said first processor-based
device includes software to provide a vocabulary set
to the speech engine in the remote control unit.

5. The system of claim 1 wherein said communication
links are infrared based.

6. The system of claim 1 including a driver that can
receive tactile or spoken commands that are recognized by
the remote control unit.

7. The system of claim 1 wherein said remote control unit
is battery-powered. 

8. The system of claim 1 wherein said remote control unit 
transmits code over said transceiver to said first processor-based
device corresponding to spoken commands received
at said remote control unit through said microphone. 

9. A remote control unit comprising: 

a processor implementing a speech engine;

a microphone coupled to said processor; 

an airwave transceiver to communicate with a remote
device; and

wherein said speech engine is configured to
operate on a limited vocabulary using information
supplied from the remote device, indicative of the
expected spoken command.

10. The remote control unit of claim 9 further including a 
set of mouse controls. 

11. The remote control unit of claim 9 wherein said 
transceiver is an infrared transceiver. 

12. The remote control unit of claim 9 wherein said 
processor includes software to send commands to the remote 
device when a spoken command is recognized by said 
speech engine. 

13. A method comprising:

identifying an application currently active on a first
processor-based device;

conveying information about the commands associated
with said application to a second processor-based
device;

receiving spoken commands at said second processor-based
device using information from said first
processor-based device to recognize said command;
and

transmitting information from said second processor-based
device to said first processor-based device based
on the recognition of said spoken command.

14. The method of claim 13 further including communicating
between said devices using an airwave communication
technique.

15. The method of claim 14 further including sending
signals between said first and second processor-based 
devices using infrared signals. 

16. The method of claim 13 further including responding 
to both spoken and tactilely generated input commands. 

17. The method of claim 13 wherein receiving spoken
commands at said second processor-based device includes 
receiving said commands through a microphone in said 
second processor-based device. 



18. The method of claim 13 including operating said 
second processor-based device using battery power. 

19. The method of claim 13 further including transmitting 
mouse command input signals from said second processor- 
based device to said first processor-based device. 

20. The method of claim 13 including receiving said 
spoken commands through a remote control unit. 

21. An article comprising a medium for storing instruc- 
tions that cause a processor-based system to: 

receive a spoken command; 

use a vocabulary received from a remote device to rec- 
ognize the spoken command; and 

transmit information to said remote device based on the 
recognition of said spoken command. 

22. The article of claim 21 further storing instructions that 
cause a processor-based system to recognize mouse input 
commands and to transmit information about said input 
commands to a remote device. 

23. The article of claim 21 further storing instructions that 
cause a processor-based system to receive a vocabulary 
related to an application receiving on said remote device. 

24. An article comprising a medium for storing instruc- 
tions that cause a processor-based system to: 

identify an application currently active on said processor- 
based system; 

convey information about the commands associated with 
said application to a remote processor-based device; 
and 

receive information from said remote processor-based 
device based on the recognition of said spoken com- 
mand. 

25. The article of claim 24 further storing instructions that 
cause a processor-based system to communicate with said 
remote device using an airwave communication technique. 

26. The article of claim 24 further storing instructions that 
cause a processor-based system to respond to both spoken 
and tactilely generated input commands.

27. The article of claim 24 further storing instructions that
cause a processor-based system to receive mouse command 
input signals from the remote processor-based device. 

28. The article of claim 24 further storing instructions that 
cause a processor-based system to transmit a portion of a 
total vocabulary to the remote device based on the currently 
active application on said processor-based system. 